Abstract


Data streaming transmission over a block fading channel is studied. It is assumed that the transmitter 
receives a new message at each channel block at a constant rate, which is fixed by an underlying 
application, and tries to deliver the arriving messages by a common deadline. Various transmission 
schemes are proposed and compared with an informed transmitter upper bound in terms of the average 
decoded rate. It is shown that in the single receiver case the adaptive joint encoding (aJE) scheme is 
asymptotically optimal, in that it achieves the ergodic capacity as the transmission deadline goes to 
infinity; and it closely follows the performance of the informed transmitter upper bound in the case of 
finite transmission deadline. On the other hand, in the presence of multiple receivers with different
signal-to-noise ratios (SNRs), memoryless transmission (MT), time sharing (TS) and superposition transmission
(ST) schemes are shown to be more robust than the joint encoding (JE) scheme as they have gradual
performance loss with decreasing SNR.

I. Introduction

In a streaming transmitter data becomes available over time rather than being available at the 
beginning of transmission. Consider, for example, digital TV satellite broadcasting. The satellite 
receives video packets from a gateway on Earth at a fixed data rate and has to forward the received 
packets to the users within a certain deadline. Hence, the transmission of the first packet starts 
before the following packets arrive at the transmitter. We consider streaming transmission over 
a block fading channel with channel state information (CSI) available only at the receiver. This 
assumption results from practical constraints when the receiver belongs to a large population 
of terminals receiving a broadcast transmission, or when the transmission delay is significantly
larger than the channel coherence time¹ [2]. The data that arrives at the transmitter over a
channel block can be modeled as an independent message whose rate is fixed by the quality
of the gateway-satellite link and the video encoding scheme used for recording the event. We
assume that the transmitter cannot modify the contents of the packets to change the data rate. This
follows from the practical fact that the satellite transmitter is oblivious to the underlying video
coding scheme adopted by the source, and considers the accumulated data over each channel
block coherence time as a single data packet that can be either transmitted or dropped.

Fig. 1. The transmitter receives message $W_i$ of rate $R$ at the beginning of channel block $i$. All the $M$ messages need to be
transmitted to the receiver by the end of channel block $M$.

We further impose a delay constraint on the transmission such that the receiver buffers the 
received messages for M channel blocks before displaying the content, which is typical of 
multimedia streaming applications (see Fig. 1). As the messages arrive at the transmitter gradually
over M channel blocks, the last message sees only a single channel realization, while the first 
message can be transmitted over the whole span of M channel blocks. For a finite number M 
of messages and M channel blocks, it is not possible to average out the effect of fading in the 
absence of CSI at the transmitter, and there is always a non-zero outage probability [3]. Hence,
the performance measure we study is the average decoded data rate by the user.

1 Transmission rate can be adjusted to the channel state through adaptive coding and modulation (ACM) driven by a feedback
channel. However, in real-time broadcast systems with large delays and many receivers, such as satellite systems, this is not
practical. For instance, according to [1] (Section 4.5.2.1), in real-time video transmission the ACM bit-rate control loop may drive
the source bit rate (e.g., variable bit rate video encoder), but this may lead to a large delay (hundreds of milliseconds) in executing
rate variation commands. In such cases the total control loop delay is too large to allow real-time compensation of fading.

Communication over fading channels has been extensively studied [4]. The capacity of a
fading channel depends on the available information about the channel behavior [5]. When both
the transmitter and the receiver have CSI, the capacity is achieved through waterfilling [6]. This
is called the ergodic capacity as the capacity is averaged over the fading distribution. In the case 
of a fast fading channel without CSI at the transmitter ergodic capacity is achieved with constant 
power transmission [4]. However, when there is a delay requirement on the transmission as in our
model, and the delay constraint is short compared to the channel coherence time, we have a slow 
fading channel. In a slow-fading channel, if only the receiver can track the channel realization, 
outage becomes unavoidable [3]. An alternative performance measure in this case is the $\epsilon$-outage
capacity [7]. In general it is hard to characterize the outage capacity exactly; hence, many works 
have focused on the high SNR [8] or the low SNR [9] asymptotic regimes. Another approach, 
which is also adopted in this work, is to study the average transmission rate as in [10] and [11].
Outages may occur even if the transmitter has access to CSI if it is required to sustain a constant 
transmission rate at all channel states. This can be due to the short-term power constraint, when 
the channel quality is so poor that the maximum power available is not sufficient to transmit 
the message reliably at the required rate [12]; or, when the average power is not sufficient to
sustain a constant rate at all channel conditions, which is called the delay-limited capacity [13].
Due to the constant rate of the arriving messages at all channel blocks our problem is similar 
to the delay-limited capacity concept. However, here we neither assume CSI at the transmitter 
nor require all arriving messages to be transmitted. Our work also differs from the average rate 
optimization in [10] since the transmitter in [10] can adapt the transmission rate based on the
channel characteristics and the delay constraint, whereas in our model the message rate is fixed 
by the underlying application. The degree-of-freedom the transmitter has in our setting is the 
multiple channel blocks it can use for transmitting the messages while being constrained by the 
causal arrival of the messages and the total delay constraint of M blocks. 

Data streaming has received significant attention recently. Most of the work in this area
focuses on practical code construction [14], [15], [16]. More closely related to our work, [17] studies
the diversity-multiplexing tradeoff in a streaming transmission system with a maximum delay
constraint for each message. Unlike in [17], we assume that the whole set of messages has a
common deadline; hence, in our setting the degree-of-freedom available to the first message is 
higher than the one available to the last. 

In the present paper we extend our work in [18] by presenting analytical results and introducing
more effective transmission schemes. We first study joint encoding (JE) which encodes all the 
available messages into a single codeword at each channel block. We also study time-sharing (TS) 
and superposition (ST) schemes. The main contributions of the present work can be summarized 
as follows: 

1) We introduce a channel model for a streaming transmitter over block fading channels with a
common decoding deadline to study real-time multimedia streaming in networks with large 
delays. 

2) We introduce an informed transmitter upper bound on the performance assuming the
availability of perfect CSI at the transmitter.

3) We show that a variant of the JE scheme, called the adaptive joint encoding (aJE) scheme, 
performs very close to the informed transmitter upper bound for a finite number of messages, 
and approaches the ergodic capacity as the number of channel blocks goes to infinity. 

4) We show that the JE scheme has a phase transition behavior, which makes it unsuitable 
for networks with multiple receivers having different average SNRs. As an alternative, we 
propose the TS and ST schemes, whose performance degrades gradually with the decreasing
average SNR. 

We support our analytical results with extensive numerical simulations. The rest of the paper is 
organized as follows. In Section II we describe the system model. In Section III we describe
the proposed transmission schemes in detail. In Section IV we provide an informed transmitter
upper bound on the average decoded rate, while Section V is devoted to the numerical results.
Finally, Section VI contains the conclusions.

II. System Model

We consider streaming transmission over a block fading channel. The channel is constant 
for a block of n channel uses and changes in an independent and identically distributed (i.i.d.) 
manner from one block to the next. We assume that the transmitter accumulates the data that 
arrives at a fixed rate during a channel block, and considers the accumulated data as a single 
message to be transmitted during the following channel blocks. We consider streaming of $M$
messages over $M$ channel blocks, such that message $W_t$ becomes available at the beginning of
channel block $t$, $t = 1, \ldots, M$ (see Fig. 1). Each message $W_t$ has rate $R$ bits per channel use
(bpcu), i.e., $W_t$ is chosen randomly with uniform distribution from the set $\mathcal{W}_t = \{1, \ldots, 2^{nR}\}$,
where $n$ is the number of channel uses per channel block. Following a typical assumption in
the literature (see, e.g., [10]), we assume that $n$, though still large (so as to give rise to the notion
of reliable communication [19]), is much shorter than the dynamics of the slow fading process.
The channel in block $t$ is given by

$$ \mathbf{y}[t] = h[t]\,\mathbf{x}[t] + \mathbf{z}[t], \tag{1} $$

where $h[t] \in \mathbb{C}$ is the channel state, $\mathbf{x}[t] \in \mathbb{C}^n$ is the channel input, $\mathbf{z}[t] \in \mathbb{C}^n$ is the i.i.d.
unit-variance Gaussian noise, and $\mathbf{y}[t] \in \mathbb{C}^n$ is the channel output. The instantaneous channel
gains are known only at the receiver. We have a short-term average power constraint of $P$, i.e.,
$E[\mathbf{x}[t]\,\mathbf{x}[t]^{\dagger}] \le nP$ for $t = 1, \ldots, M$, where $\mathbf{x}[t]^{\dagger}$ represents the Hermitian transpose of $\mathbf{x}[t]$
and $E[x]$ is the mean value of $x$. The short-term power constraint models the restriction on the
maximum power radiated by the transmitter, which is present in many practical systems.²

The channel from the source to the receiver can be seen as a multiple access channel (MAC)
with a special message hierarchy [22], in which the encoder at each channel block acts as a
separate virtual transmitter (see Fig. 2), and the receiver tries to decode as many of the messages
as possible. Our performance measure is the average decoded rate. We denote the instantaneous
channel capacity over channel block $t$ by $C_t \triangleq \log_2(1 + \phi[t]P)$, where $\phi[t]$ is a random variable
distributed according to a generic probability density function (pdf) $f_{\Phi}(\phi)$. Note that $C_t$ is also
a random variable. We define $\bar{C} \triangleq E[\log_2(1 + \Phi P)]$, where the expectation is taken over $f_{\Phi}(\phi)$.
$\bar{C}$ is the ergodic capacity of this channel when there is no delay constraint on the transmission.

2 In cellular systems, for instance, the maximum power emitted by the transmitter is generally bounded in order to limit
the interference to neighbor cells and keep it under a threshold value [20]. In satellite systems broadcasting multimedia traffic
the onboard high power amplifier is generally driven to the limit of saturation in order to optimize the cost of the system by
providing the maximum output power under given distortion constraints ([21], Section 9.2).

Fig. 2. Equivalent channel model for the sequential transmission of $M$ messages over $M$ channel blocks to a single receiver.
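As an illustration of the capacity definitions in this section, the following minimal sketch draws per-block capacities and estimates the ergodic capacity by Monte Carlo. It assumes Rayleigh fading, i.e., a unit-mean exponential channel power gain, which is only one possible choice for the generic pdf of the model; the function names and the sample size are hypothetical.

```python
import math
import random


def sample_capacities(num_blocks, power, rng):
    """Draw C_t = log2(1 + phi[t] * P) for num_blocks i.i.d. blocks.

    Assumption (not from the text): phi[t] is exponential with unit mean,
    which corresponds to Rayleigh fading.
    """
    return [math.log2(1.0 + rng.expovariate(1.0) * power)
            for _ in range(num_blocks)]


def ergodic_capacity_mc(power, num_samples=200_000, seed=0):
    """Monte Carlo estimate of E[log2(1 + Phi * P)] under the same assumption."""
    rng = random.Random(seed)
    return sum(sample_capacities(num_samples, power, rng)) / num_samples


if __name__ == "__main__":
    # Linear power P = 1 corresponds to an average SNR of 0 dB.
    print(round(ergodic_capacity_mc(1.0), 3))
```

Under this fading assumption the estimate at 0 dB falls below 1 bpcu, so a message rate of 1 bpcu would exceed the ergodic capacity at that SNR.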

III. Transmission Schemes 

The most straightforward transmission scheme is to send each message only within the channel 
block following its arrival. This is called memoryless transmission (MT). Due to the i.i.d. nature 
of the channel over blocks, successful decoding probability is constant over messages. Denoting 
this probability by $p \triangleq \Pr\{C_t \ge R\}$, the probability that exactly $m$ messages are decoded is

$$ \eta(m) \triangleq \binom{M}{m} p^m (1 - p)^{M - m}. \tag{2} $$

Note that we have a closed-form expression for $\eta(m)$, and it can be further approximated with
a Gaussian distribution if we let $M$ go to infinity, i.e.,

$$ \eta(m) \approx \frac{1}{\sqrt{2\pi M p (1 - p)}}\, e^{-\frac{(m - Mp)^2}{2 M p (1 - p)}}. \tag{3} $$

The average decoded rate of the MT scheme, $\bar{R}_{MT}$, is found by evaluating $\frac{R}{M} \sum_{m=1}^{M} m\, \eta(m)$. The
MT scheme treats all messages equally. However, depending on the average channel conditions,
it might be more beneficial to allocate more resources to some of the messages in order to
increase the average decoded rate. In the following, we will consider three basic transmission
schemes based on the type of resource allocation used. We will find the average decoded rate for
these schemes and compare them with an upper bound that will be introduced in Section IV.

Fig. 3. Total decoded rate regions in the $(C_1, C_2)$ plane with $M = 2$ messages for MT (on the left) and JE (on the right) schemes.

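The MT computation can be sketched numerically as follows. The sketch evaluates the binomial probabilities of Eqn. (2) and the resulting average rate, normalized by the number of blocks $M$ as in the other rate expressions of this paper. The closed form used for $p$ assumes Rayleigh fading with a unit-mean exponential power gain, which is an assumption of this example rather than part of the model, and all function names are hypothetical.

```python
import math


def mt_decode_probability(rate, power):
    """p = Pr{C_t >= R} for C_t = log2(1 + phi * P).

    Assumption (not from the text): phi is exponential with unit mean
    (Rayleigh fading), so p = exp(-(2**R - 1) / P).
    """
    return math.exp(-(2.0 ** rate - 1.0) / power)


def mt_pmf(num_messages, p):
    """Binomial probabilities eta(m), m = 0..M, as in Eqn. (2)."""
    return [math.comb(num_messages, m) * p ** m * (1.0 - p) ** (num_messages - m)
            for m in range(num_messages + 1)]


def mt_average_rate(num_messages, rate, p):
    """(R / M) * sum_m m * eta(m); for a binomial law this equals R * p."""
    eta = mt_pmf(num_messages, p)
    return rate / num_messages * sum(m * eta[m] for m in range(num_messages + 1))
```

Since the mean of a binomial variable is $Mp$, the normalized average rate reduces to $Rp$, which does not depend on $M$.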
A. Joint Encoding Transmission 

In the joint encoding (JE) scheme we generate a single multiple-index codebook for each channel
block. For channel block $t$, we generate a $t$-dimensional codebook of size $s_1 \times \cdots \times s_t$, $s_i = 2^{nR}$,
$\forall i \in \{1, \ldots, t\}$, with Gaussian distribution, and index the codewords as $\mathbf{x}_t(W_1, \ldots, W_t)$,
where $W_i \in \mathcal{W} = \{1, \ldots, 2^{nR}\}$ for $i = 1, \ldots, t$. The receiver uses a joint typicality decoder and
tries to estimate as many messages as possible at the end of block $M$. With high probability, it
will be able to decode the first $m$ messages correctly if

$$ (m - j + 1) R \le \sum_{t=j}^{m} C_t, \quad \forall\, j = 1, 2, \ldots, m. \tag{4} $$

As a comparison, we illustrate the achievable rate regions for MT and JE schemes for $M = 2$
in Fig. 3. In the case of MT, a total rate of $2R$ can be decoded successfully if both capacities
$C_1$ and $C_2$ are above $R$. We achieve a total rate of $R$ if only one of the capacities is above $R$.
On the other hand, in the case of joint encoding, we trade off a part of the region of rate $R$ for
rate $2R$; that is, we achieve a rate of $2R$ instead of rate $R$, while rate 0 is achieved rather than
rate $R$ in the remaining region.

Using the conditions in (4) we define functions $f_m(R)$, for $m = 0, 1, \ldots, M$, as follows:

$$ f_m(R) \triangleq \begin{cases} 1, & \text{if } (m - j + 1) R \le \sum_{t=j}^{m} C_t, \ j = 1, \ldots, m, \\ 0, & \text{otherwise.} \end{cases} $$

Then the probability of decoding exactly $m$ messages can be written as

$$ \eta(m) = \Pr\{ f_m(R) = 1 \text{ and } f_{m+1}(R) = 0 \}. \tag{5} $$

After some manipulation, it is possible to prove that exactly $m$ messages, $m = 0, 1, \ldots, M$, can
be decoded if:

$$ C_{m-i+1} + \cdots + C_m \ge iR, \quad i = 1, \ldots, m, \tag{6} $$

$$ C_{m+1} + \cdots + C_{m+i} < iR, \quad i = 1, \ldots, M - m. \tag{7} $$
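The decoding conditions above can be checked mechanically for a given realization of the capacities. The following is a minimal sketch with hypothetical function names: it evaluates the indicator $f_m(R)$, returns the largest $m$ for which it equals one, and separately tests conditions (6) and (7) for a candidate $m$.

```python
def je_indicator(capacities, rate, m):
    """f_m(R): 1 if (m - j + 1) * R <= C_j + ... + C_m for every j = 1..m."""
    tail_sum = 0.0
    for j in range(m, 0, -1):          # j = m, m-1, ..., 1
        tail_sum += capacities[j - 1]  # C_j + ... + C_m
        if (m - j + 1) * rate > tail_sum:
            return 0
    return 1


def je_num_decoded(capacities, rate):
    """Largest m with f_m(R) = 1 (0 if no message can be decoded)."""
    for m in range(len(capacities), 0, -1):
        if je_indicator(capacities, rate, m):
            return m
    return 0


def satisfies_6_and_7(capacities, rate, m):
    """Check conditions (6) and (7) for a candidate m."""
    if not je_indicator(capacities, rate, m):   # (6) is f_m(R) = 1 re-indexed
        return False
    head_sum = 0.0
    for i in range(1, len(capacities) - m + 1):
        head_sum += capacities[m + i - 1]       # C_{m+1} + ... + C_{m+i}
        if head_sum >= i * rate:
            return False
    return True
```

For $M = 2$ and $R = 1$, the realization $(C_1, C_2) = (0.5, 3.0)$ yields two decoded messages where MT would decode one, while $(0.5, 1.2)$ yields none where MT would still decode one, in line with the trade-off described for Fig. 3.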

Then $\eta(m)$ can be calculated as in Eqn. (8), where we have defined
$x^+ \triangleq \max\{0, x\}$, and $f_{C_1 \cdots C_m}(c_1, \ldots, c_m)$ as the joint pdf of $C_1, \ldots, C_m$, which is equal to
the product of the marginal pdfs due to independence:

$$ \eta(m) = \int_{R}^{\infty} \int_{(2R - x_m)^+}^{\infty} \cdots \int_{(mR - x_m - \cdots - x_2)^+}^{\infty} f_{C_1 \cdots C_m}(x_1, \ldots, x_m)\, dx_1 \cdots dx_m \times \int_{0}^{R} \int_{0}^{2R - x_{m+1}} \cdots \int_{0}^{(M-m)R - x_{m+1} - \cdots - x_{M-1}} f_{C_{m+1} \cdots C_M}(x_{m+1}, \ldots, x_M)\, dx_{m+1} \cdots dx_M. \tag{8} $$

The probability in Eqn. (8) cannot be
easily evaluated for a generic $M$. However, we provide a much simpler way to calculate the
average decoded rate $\bar{R}_{JE}$. The simplification of the average rate expression is valid not only
for i.i.d. but also for conditionally i.i.d. channels. Random variables $\{C_1, \ldots, C_M\}$ are said to
be conditionally i.i.d. given a random variable $U$ if the joint distribution is of the form

$$ f_{C_1, \ldots, C_M, U}(c_1, \ldots, c_M, u) = f_{C_1|U}(c_1|u) \times \cdots \times f_{C_M|U}(c_M|u)\, f_U(u), \tag{9} $$

where

$$ f_{C_i|U}(c_i|u) = f_{C_j|U}(c_i|u), \quad \forall\, i, j \in \{1, \ldots, M\}. \tag{10} $$

Note that i.i.d. channels are a particular case of conditionally i.i.d. channels where $U$ is a constant.

Theorem 1: The average decoded rate for the JE scheme in the case of conditionally i.i.d.
channel capacities is given by:

$$ \bar{R}_{JE} = \frac{R}{M} \sum_{m=1}^{M} \Pr\{ C_1 + \cdots + C_m \ge mR \}. \tag{11} $$

Proof: See Appendix.
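Theorem 1 can be sanity-checked by simulation. The sketch below estimates the JE average decoded rate in two ways from the same random draws: directly, by counting decoded messages with the JE decoding conditions, and through the sum of probabilities in Eqn. (11). It assumes i.i.d. Rayleigh fading (unit-mean exponential power gain), which is an assumption of this example; the parameter values and function names are hypothetical.

```python
import math
import random


def num_decoded_je(capacities, rate):
    """Largest m such that (m - j + 1) * R <= C_j + ... + C_m for all j <= m."""
    for m in range(len(capacities), 0, -1):
        tail_sum, ok = 0.0, True
        for j in range(m, 0, -1):
            tail_sum += capacities[j - 1]
            if (m - j + 1) * rate > tail_sum:
                ok = False
                break
        if ok:
            return m
    return 0


def compare_with_theorem_1(num_blocks=5, rate=1.0, power=2.0,
                           trials=20_000, seed=1):
    """Return (direct estimate, estimate via Eqn. (11)) of the JE average rate.

    Assumption (not from the text): Rayleigh fading, phi ~ Exp(1).
    """
    rng = random.Random(seed)
    decoded_total = 0
    prefix_hits = [0] * num_blocks   # counts of {C_1 + ... + C_m >= m R}
    for _ in range(trials):
        caps = [math.log2(1.0 + rng.expovariate(1.0) * power)
                for _ in range(num_blocks)]
        decoded_total += num_decoded_je(caps, rate)
        prefix = 0.0
        for m in range(1, num_blocks + 1):
            prefix += caps[m - 1]
            if prefix >= m * rate:
                prefix_hits[m - 1] += 1
    direct = rate * decoded_total / (trials * num_blocks)
    via_theorem = rate * sum(prefix_hits) / (trials * num_blocks)
    return direct, via_theorem
```

The two estimates should agree up to Monte Carlo noise; the second one only needs running sums of the capacities, which is what makes Eqn. (11) convenient compared with Eqn. (8).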

In general it is still difficult to find an exact expression for $\bar{R}_{JE}$, but it is possible to show
that $\bar{R}_{JE}$ approaches $R$ for large $M$ if $\bar{C} > R$. To prove this, we rewrite Eqn. (11) as:

$$ \bar{R}_{JE} = R - \frac{R}{M} \sum_{m=1}^{M} a_m, \tag{12} $$

where we have defined

$$ a_m \triangleq \Pr\left\{ \frac{C_1 + \cdots + C_m}{m} < R \right\}. \tag{13} $$

It is sufficient to prove that, if $\bar{C} > R$, then $\lim_{M \to \infty} \sum_{m=1}^{M} a_m = c$, for some $0 < c < \infty$. We
start by noting that $\lim_{m \to +\infty} a_m = 0$, since, by the law of large numbers, $\frac{C_1 + \cdots + C_m}{m}$ converges
to a Gaussian random variable with mean $\bar{C}$ and variance $\frac{\sigma_c^2}{m}$ as $m$ goes to infinity, $\sigma_c^2$ being the
variance of the channel capacity. To prove the convergence of the series sum we show that

$$ \lim_{m \to +\infty} \frac{a_{m+1}}{a_m} = \lambda, \tag{14} $$

with $0 < \lambda < 1$. We define

$$ l_m \triangleq \frac{\bar{C} - \frac{C_1 + \cdots + C_m}{m}}{\sigma_c / \sqrt{m}}, \quad m = 1, 2, \ldots, M, \tag{15} $$

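where each $l_m$ is a random variable with zero mean and unit variance. From the central limit
theorem we can write:

$$ \lim_{m \to +\infty} \frac{a_{m+1}}{a_m} = \lim_{m \to +\infty} \frac{\Pr\left\{ l_{m+1} > \frac{\bar{C} - R}{\sigma_c / \sqrt{m+1}} \right\}}{\Pr\left\{ l_m > \frac{\bar{C} - R}{\sigma_c / \sqrt{m}} \right\}} \tag{16} $$

$$ = \lim_{m \to +\infty} \frac{Q\left( \frac{\bar{C} - R}{\sigma_c / \sqrt{m+1}} \right)}{Q\left( \frac{\bar{C} - R}{\sigma_c / \sqrt{m}} \right)} \tag{17} $$

$$ \le \lim_{m \to +\infty} \frac{\frac{\sigma_c}{(\bar{C} - R)\sqrt{2\pi (m+1)}}\, e^{-\frac{(m+1)(\bar{C} - R)^2}{2\sigma_c^2}}}{\frac{(\bar{C} - R)\,\sigma_c \sqrt{m}}{\left( \sigma_c^2 + m(\bar{C} - R)^2 \right)\sqrt{2\pi}}\, e^{-\frac{m(\bar{C} - R)^2}{2\sigma_c^2}}} \tag{18} $$

$$ = \lim_{m \to +\infty} \frac{\sigma_c^2 + m(\bar{C} - R)^2}{\sqrt{m(m+1)}\,(\bar{C} - R)^2}\, e^{-\frac{(\bar{C} - R)^2}{2\sigma_c^2}} \tag{19} $$

$$ = e^{-\frac{(\bar{C} - R)^2}{2\sigma_c^2}} < 1, \tag{20} $$

where inequality (18) follows from the bounds on the Q-function:

$$ \frac{x}{(1 + x^2)\sqrt{2\pi}}\, e^{-\frac{x^2}{2}} < Q(x) < \frac{1}{x\sqrt{2\pi}}\, e^{-\frac{x^2}{2}} \quad \text{for } x > 0. \ \blacksquare \tag{21} $$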
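The Q-function bounds in Eqn. (21) and the limiting ratio in Eqn. (20) can be checked numerically. This is a small sketch using the complementary error function from the Python standard library; the function names, and the symbols `gap` (standing for $\bar{C} - R$) and `sigma` (standing for $\sigma_c$), are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = Pr{N(0, 1) > x}."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_bounds(x):
    """Lower and upper bounds of Eqn. (21), valid for x > 0."""
    gauss = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    return x / (1.0 + x * x) * gauss, gauss / x


def tail_ratio(m, gap, sigma):
    """Q(x_{m+1}) / Q(x_m) with x_m = gap * sqrt(m) / sigma, as in Eqn. (17)."""
    return (q_function(gap * math.sqrt(m + 1) / sigma)
            / q_function(gap * math.sqrt(m) / sigma))
```

For instance, with `gap = 0.5` and `sigma = 1.0` the ratio at `m = 1000` is already close to $e^{-0.125}$, the value predicted by Eqn. (20).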

In a similar way, we prove that if $\bar{C} < R$, then the average rate tends to zero asymptotically with
$M$. To see this, we consider the series in Eqn. (11), defining $b_m \triangleq \Pr\{ C_1 + \cdots + C_m \ge mR \}$.
We want to prove that $\frac{1}{M} \sum_{m=1}^{M} b_m$ converges to zero. We first notice that $\lim_{m \to +\infty} b_m = 0$ by the
law of large numbers. Similarly to the above arguments, one can show that $\lim_{m \to +\infty} \frac{b_{m+1}}{b_m} = 0$;
and hence, $\bar{R}_{JE}$ goes to zero as we increase the number of messages and the channel blocks.
Overall we see that the average rate of the JE scheme shows a threshold behavior, i.e., we have:

$$ \lim_{M \to \infty} \bar{R}_{JE} = \begin{cases} R, & \text{if } R < \bar{C}, \\ 0, & \text{if } R > \bar{C}. \end{cases} \tag{22} $$

Eqn. (22) indicates a phase transition such that $\bar{R}_{JE}$ is zero even for large $M$ if $R > \bar{C}$ and
the transmission rate cannot be modified. However, the transmitter may choose to transmit only
a fraction $\alpha = \frac{M'}{M} < 1$ of the messages, allocating the extra $M - M'$ channel blocks to the
$M'$ messages, effectively controlling the transmission rate. In other words, the $M'$ messages are
encoded and transmitted as described in the first part of this section in $M'$ channel blocks, while
each of the remaining $M - M'$ blocks is divided into $M'$ equal parts, and the encoding process
used for the first $M'$ blocks is repeated, using independent codewords, across the $M'$ parts of each
block. For instance, let $M = 3$ and $M' = 2$. Then, $\mathbf{x}_1(W_1)$ and $\mathbf{x}_2(W_1, W_2)$ are transmitted in
the first and second channel blocks, respectively. The third channel block is divided into $M' = 2$
equal parts and the independent codewords $\mathbf{x}_{31}(W_1)$ and $\mathbf{x}_{32}(W_1, W_2)$ are transmitted in the first
and in the second half of the block, respectively. We call this variant of the JE scheme the adaptive
JE (aJE) scheme. The conditions for decoding exactly $m$ messages, $m = 0, 1, \ldots, M'$, in aJE
can be obtained from those given in (6) and (7) by replacing $C_i$ with $C_i^* \triangleq C_i + \frac{1}{M'} \sum_{j=M'+1}^{M} C_j$,
$i \in \{1, \ldots, M'\}$. Note that the random variables $C_i^*$, $i \in \{1, \ldots, M'\}$, are conditionally i.i.d., i.e.,
they are i.i.d. once the variable $U = \frac{1}{M'} \sum_{j=M'+1}^{M} C_j$ is fixed. This implies that Theorem 1 holds.
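The aJE construction can be illustrated for a single channel realization. The sketch below forms the effective capacities $C_i^*$ by spreading the capacity of the last $M - M'$ blocks equally over the $M'$ transmitted messages, and then applies the JE decoding conditions to them. Function and parameter names are hypothetical.

```python
def aje_effective_capacities(capacities, num_sent):
    """C*_i = C_i + (1 / M') * sum_{j > M'} C_j for i = 1..M'.

    capacities: per-block capacities C_1..C_M; num_sent: M' (messages sent).
    """
    shared = sum(capacities[num_sent:]) / num_sent   # the common term U
    return [c + shared for c in capacities[:num_sent]]


def aje_num_decoded(capacities, rate, num_sent):
    """Apply the JE decoding conditions to C*_1..C*_{M'}."""
    eff = aje_effective_capacities(capacities, num_sent)
    for m in range(num_sent, 0, -1):
        tail_sum, ok = 0.0, True
        for j in range(m, 0, -1):
            tail_sum += eff[j - 1]
            if (m - j + 1) * rate > tail_sum:
                ok = False
                break
        if ok:
            return m
    return 0
```

With $M = 3$, $M' = 2$, $R = 1$ and capacities $(0.5, 0.4, 1.6)$, the shared term is $0.8$, the effective capacities are $(1.3, 1.2)$, and both transmitted messages are decoded.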
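
In the following we prove that the average decoded rate of the aJE scheme $\bar{R}_{aJE}$ approaches $\alpha R$
for large $M$ if $\bar{C} > \alpha R$. Similarly to the JE scheme, it is sufficient to prove that, if $\bar{C} > \alpha R$,

$$ \lim_{M \to \infty} \sum_{m=1}^{M\alpha} a_m^* = c, \tag{23} $$

for some $0 < c < \infty$, where $a_m^* \triangleq \Pr\left\{ \frac{C_1^* + \cdots + C_m^*}{m} < R \right\}$. We can rewrite $a_m^*$ as follows:

$$ a_m^* = \Pr\left\{ \frac{C_1 + \cdots + C_m}{m} + \frac{1}{M\alpha} \sum_{j=M\alpha+1}^{M} C_j < R \right\} \tag{24} $$

$$ = \Pr\left\{ \frac{C_1 + \cdots + C_m}{m} + \frac{1 - \alpha}{\alpha} \cdot \frac{1}{M(1 - \alpha)} \sum_{j=M\alpha+1}^{M} C_j < R \right\} \tag{25} $$

$$ = \Pr\left\{ l'_m > \frac{\bar{C}/\alpha - R}{\sigma_c \sqrt{\frac{1}{m} + \frac{1 - \alpha}{M\alpha^2}}} \right\}, \tag{26} $$

where

$$ l'_m \triangleq \frac{\frac{\bar{C}}{\alpha} - \frac{C_1 + \cdots + C_m}{m} - \frac{1 - \alpha}{\alpha} \cdot \frac{1}{M(1 - \alpha)} \sum_{j=M\alpha+1}^{M} C_j}{\sigma_c \sqrt{\frac{1}{m} + \frac{1 - \alpha}{M\alpha^2}}} \tag{27} $$

is a random variable with zero mean and unit variance. Since $m \le M$ and by the law of large
numbers applied to Eqn. (26) we find $\lim_{m \to +\infty} a_m^* = 0$, since $l'_m$ converges to a Gaussian
random variable with zero mean and unit variance as $m$ goes to infinity. First we show that

$$ \lim_{m \to +\infty} \frac{a_m^*}{d_m} = d, \tag{28} $$

for some $0 < d < +\infty$, where we have defined:

$$ d_m \triangleq \Pr\left\{ l''_m > \frac{\bar{C}/\alpha - R}{\sigma_c \sqrt{\frac{1}{m} + \frac{1 - \alpha}{m\alpha^2}}} \right\} \tag{29} $$

and

$$ l''_m \triangleq \frac{\frac{\bar{C}}{\alpha} - \frac{C_1 + \cdots + C_m}{m} - \frac{1 - \alpha}{\alpha} \cdot \frac{1}{m(1 - \alpha)} \sum_{j=M\alpha+1}^{M\alpha + m(1 - \alpha)} C_j}{\sigma_c \sqrt{\frac{1}{m} + \frac{1 - \alpha}{m\alpha^2}}}, \tag{30} $$

such that $l''_m$ is a random variable with zero mean and unit variance. From Eqn. (28) we find

$$ \lim_{m \to +\infty} \frac{a_m^*}{d_m} = \lim_{m \to +\infty} \frac{\Pr\left\{ l'_m > \frac{\bar{C}/\alpha - R}{\sigma_c \sqrt{\frac{1}{m} + \frac{1 - \alpha}{M\alpha^2}}} \right\}}{\Pr\left\{ l''_m > \frac{\bar{C}/\alpha - R}{\sigma_c \sqrt{\frac{1}{m} + \frac{1 - \alpha}{m\alpha^2}}} \right\}} \tag{31} $$

$$ = \lim_{m \to +\infty} \frac{Q\left( \frac{\bar{C}/\alpha - R}{\sigma_c \sqrt{\frac{1}{m} + \frac{1 - \alpha}{M\alpha^2}}} \right)}{Q\left( \frac{\bar{C}/\alpha - R}{\sigma_c \sqrt{\frac{1}{m} + \frac{1 - \alpha}{m\alpha^2}}} \right)} \tag{32} $$

$$ \le \lim_{m \to +\infty} \frac{Q\left( \frac{\bar{C}/\alpha - R}{\sigma_c \sqrt{\frac{1}{m} + \frac{1 - \alpha}{m\alpha^2}}} \right)}{Q\left( \frac{\bar{C}/\alpha - R}{\sigma_c \sqrt{\frac{1}{m} + \frac{1 - \alpha}{m\alpha^2}}} \right)} \tag{33} $$

$$ = 1, \tag{34} $$

where inequality (33) follows from the fact that $m \le M$ and from the fact that $Q(x)$ is
monotonically decreasing in $x$. Then we show that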

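$$ \lim_{M \to \infty} \sum_{m=1}^{M\alpha} d_m = c'', \tag{35} $$

for some $0 < c'' < +\infty$. To prove the convergence of the series sum we show that
$\lim_{m \to +\infty} \frac{d_{m+1}}{d_m} = \lambda'$, for some $0 < \lambda' < 1$. From the central limit theorem we can write:

$$ \lim_{m \to +\infty} \frac{d_{m+1}}{d_m} = \lim_{m \to +\infty} \frac{\Pr\left\{ l''_{m+1} > \frac{\bar{C}/\alpha - R}{\sigma_c \sqrt{\frac{1}{m+1} + \frac{1 - \alpha}{(m+1)\alpha^2}}} \right\}}{\Pr\left\{ l''_m > \frac{\bar{C}/\alpha - R}{\sigma_c \sqrt{\frac{1}{m} + \frac{1 - \alpha}{m\alpha^2}}} \right\}} \tag{36} $$

$$ = \lim_{m \to +\infty} \frac{Q\left( \frac{\bar{C}/\alpha - R}{\sigma_c \sqrt{\frac{1}{m+1} + \frac{1 - \alpha}{(m+1)\alpha^2}}} \right)}{Q\left( \frac{\bar{C}/\alpha - R}{\sigma_c \sqrt{\frac{1}{m} + \frac{1 - \alpha}{m\alpha^2}}} \right)} \tag{37} $$

$$ \le \lim_{m \to +\infty} \frac{\frac{\sigma_c \sqrt{\frac{1}{m+1} + \frac{1 - \alpha}{(m+1)\alpha^2}}}{(\bar{C}/\alpha - R)\sqrt{2\pi}} \exp\left( -\frac{(\bar{C}/\alpha - R)^2}{2\sigma_c^2 \left( \frac{1}{m+1} + \frac{1 - \alpha}{(m+1)\alpha^2} \right)} \right)}{\frac{(\bar{C}/\alpha - R)\,\sigma_c \sqrt{\frac{1}{m} + \frac{1 - \alpha}{m\alpha^2}}}{\left[ \sigma_c^2 \left( \frac{1}{m} + \frac{1 - \alpha}{m\alpha^2} \right) + (\bar{C}/\alpha - R)^2 \right] \sqrt{2\pi}} \exp\left( -\frac{(\bar{C}/\alpha - R)^2}{2\sigma_c^2 \left( \frac{1}{m} + \frac{1 - \alpha}{m\alpha^2} \right)} \right)} \tag{38} $$

$$ = \lim_{m \to +\infty} \sqrt{\frac{m}{m+1}} \cdot \frac{\sigma_c^2 \left( \frac{1}{m} + \frac{1 - \alpha}{m\alpha^2} \right) + (\bar{C}/\alpha - R)^2}{(\bar{C}/\alpha - R)^2}\, \exp\left( -\frac{(\bar{C}/\alpha - R)^2}{2\sigma_c^2 \left( 1 + \frac{1 - \alpha}{\alpha^2} \right)} \right) \tag{39} $$

$$ = \exp\left( -\frac{(\bar{C}/\alpha - R)^2}{2\sigma_c^2 \left( 1 + \frac{1 - \alpha}{\alpha^2} \right)} \right) < 1, \tag{40} $$

where inequality (38) follows from the bounds on the Q-function given in Eqn. (21). $\blacksquare$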

From Eqn. (40) it follows that $\lim_{M \to \infty} \bar{R}_{aJE} = \alpha R$ if $\alpha R < \bar{C}$. Similarly, it can be easily
shown that $\lim_{M \to \infty} \bar{R}_{aJE} = 0$ if $\alpha R > \bar{C}$. Thus by choosing $\alpha$ appropriately, we can have

$$ \lim_{M \to \infty} \bar{R}_{aJE} = \min\{R, \bar{C}\}. \tag{41} $$

Eqn. (41) suggests that the average transmission rate can be adapted at the message level while
keeping a fixed rate at the physical layer. We will see in Section IV that the maximum average
decoded rate cannot be above this value; hence, as the number of messages and the channel blocks
go to infinity, the aJE scheme achieves the optimal performance. We will show in Section V
through numerical analysis that near optimality of the aJE scheme is valid even for finite $M$.
However, we also note the threshold behavior of the performance of aJE; that is, when there are
multiple users or inaccuracy in the channel statistics information at the transmitter, aJE performs
very poorly for users whose average received SNR is below the target value. In the following we
propose alternative transmission schemes with more gradual performance change with the SNR.

B. Time-Sharing Transmission (TS) 

One of the resources that the encoder can allocate among different messages is the total
number of channel uses within each channel block. While the whole first channel block has
to be dedicated to message $W_1$ (the only available message), the second channel block can be
divided among the messages $W_1$ and $W_2$, and so on. Assume that the encoder divides
the channel block $t$ into $t$ portions $\alpha_{1t}, \ldots, \alpha_{tt}$ such that $\alpha_{it} \ge 0$ and $\sum_{i=1}^{t} \alpha_{it} = 1$. In channel
block $t$, $\alpha_{it} n$ channel uses are allocated to message $W_i$. A constant power $P$ is used throughout
the block. Then the total amount of received mutual information (MI) relative to message $W_i$ is
$I_i^{tot} \triangleq \sum_{t=i}^{M} \alpha_{it} C_t$. Letting $\alpha_{it} = 1$ if $t = i$ and $\alpha_{it} = 0$ otherwise, we obtain the MT scheme.

For simplicity, in the time sharing (TS) scheme we assume equal time allocation among all
the available messages; that is, for $i = 1, \ldots, M$, we have $\alpha_{it} = \frac{1}{t}$ for $t = i, i + 1, \ldots, M$,
and $\alpha_{it} = 0$ for $t = 1, \ldots, i - 1$. The messages that arrive earlier are allocated more resources; and
hence, are more likely to be decoded. We have $I_i^{tot} > I_j^{tot}$ for $1 \le i < j \le M$. Hence, the
probability of decoding at least $m$ messages is:

$$ \eta(m) \triangleq \Pr\{ I_m^{tot} \ge R \}, \quad \text{for } m = 0, 1, \ldots, M, \tag{42} $$

where we define $I_{M+1}^{tot} \triangleq 0$ and $I_0^{tot} \triangleq \infty$. Then the average decoded rate is:

$$ \bar{R}_{TS} = \frac{R}{M} \sum_{m=1}^{M} \eta(m) = \frac{R}{M} \sum_{m=1}^{M} \Pr\left\{ \frac{C_m}{m} + \cdots + \frac{C_M}{M} \ge R \right\}. \tag{43} $$
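For a given realization of the capacities, the TS scheme with equal time allocation can be evaluated directly. The following minimal sketch (hypothetical function names) accumulates the mutual information of each message, with message $W_i$ receiving a share $1/t$ of every block $t \ge i$, and counts the messages that reach the rate $R$.

```python
def ts_mutual_information(capacities):
    """I_i^tot = sum_{t=i}^{M} C_t / t for i = 1..M (equal time sharing)."""
    total, info = 0.0, []
    for t in range(len(capacities), 0, -1):   # accumulate from block M down to 1
        total += capacities[t - 1] / t
        info.append(total)
    info.reverse()
    return info


def ts_num_decoded(capacities, rate):
    """Number of messages whose accumulated MI reaches the rate R."""
    return sum(1 for mi in ts_mutual_information(capacities) if mi >= rate)
```

Because the accumulated MI is non-increasing in the message index, the decoded messages always form a prefix of the message sequence, which is what allows the average rate to be written as a sum of single probabilities.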

C. Generalized Time-Sharing Transmission (gTS) 

In generalized time-sharing transmission each message is encoded with equal time allocation
over $W$ consecutive blocks as long as the total deadline of $M$ channel blocks is not met.
Messages from $W_1$ to $W_{M-W+1}$ are encoded over a window of $W$ blocks, while messages $W_i$,
for $i \in \{M - W + 2, M - W + 3, \ldots, M\}$ are encoded over $M - i + 1$ blocks. In particular
we focus on the effect of variable $W$ on the average decoded rate $\bar{R}_{gTS}$. In case $W \ll M$ and
$W \gg 1$, most of the messages are transmitted over $W$ slots together with $W - 1$ other messages.
In this case the MI accumulated for a generic message $W_i$ is:

$$ I_i^{tot} = \frac{1}{W} \sum_{t=i}^{i+W-1} C_t. \tag{44} $$

By the law of large numbers, (44) converges in probability to the average channel capacity $\bar{C}$ as
$W \to \infty$. Thus, we expect that, when the transmission rate $R$ is above $\bar{C}$, the gTS scheme shows
poor performance for large $W$ (and hence, large $M$), while almost all messages are received
successfully if $R < \bar{C}$. We confirm this by analyzing the effect of $W$ on $\bar{R}_{gTS}$ numerically in Fig.
4 for $M = 10^4$ and $R = 1$ bpcu. For $P = 0$ dB the average channel capacity $\bar{C}$ is lower than
$R$, which leads to a decreasing $\bar{R}_{gTS}$ with increasing window size $W$. On the other hand, for
$P = 2$ dB $\bar{C}$ is higher than $R = 1$ bpcu, and accordingly $\bar{R}_{gTS}$ approaches 1 as $W$ increases.

Fig. 4. Average decoded rate for the gTS scheme plotted against the window size $W$ for $M = 10^4$ messages and $R = 1$ bpcu
for two different average SNR values.
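The effect of the window size can be explored for a fixed realization with the sketch below. It rests on an interpretation of the gTS description that should be read as an assumption: in block $t$, the $\min(t, W)$ most recent messages share the block equally, so that $W = 1$ reduces to MT and $W = M$ to TS. Function names are hypothetical.

```python
def gts_mutual_information(capacities, window):
    """MI per message when each message spans at most `window` blocks.

    Assumption (an interpretation of the text): in block t the min(t, window)
    most recent messages share the block equally.
    """
    num_blocks = len(capacities)
    info = []
    for i in range(1, num_blocks + 1):
        last = min(i + window - 1, num_blocks)   # deadline truncates the window
        info.append(sum(capacities[t - 1] / min(t, window)
                        for t in range(i, last + 1)))
    return info


def gts_num_decoded(capacities, rate, window):
    """Number of messages whose accumulated MI reaches the rate R."""
    return sum(1 for mi in gts_mutual_information(capacities, window)
               if mi >= rate)
```

For capacities $(1.5, 0.5, 2.2)$ and $R = 1$, this gives two decoded messages for $W = 1$, three for $W = 2$ and one for $W = 3$, a small instance in which an intermediate window size performs best.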

The same reasoning cannot be applied if the window size is of the order of the number of
messages, as the number of initial messages which share the channel with less than $W - 1$ other
messages and the number of final messages which share the channel with more than $W - 1$
messages are no longer negligible with respect to $M$. In Fig. 5(a) we plot $\bar{R}_{gTS}$ vs $W$ for
relatively small numbers of messages and $\bar{C} > R$. As seen in the figure, for a given value of
$M$ an optimal value of $W$ can be chosen to maximize $\bar{R}_{gTS}$. The optimal value of $W$ increases
with $M$ when $R < \bar{C}$. We plot $\bar{R}_{gTS}$ for $\bar{C} < R$ in Fig. 5(b). From the figure we see that $\bar{R}_{gTS}$
decreases monotonically with $W$ up to a minimum, after which it increases almost linearly.
The initial decrease in the decoded rate is due to the averaging effect described above, while the
following increase is due to the fact that messages which are transmitted earlier get an increasing
amount of resources as $W$ increases, and so the probability to be decoded increases. As a matter
of fact, for each finite $i$, the average MI accumulated for message $i$ grows indefinitely with $W$,
i.e.:

$$ \lim_{W \to \infty} E\left[ \sum_{t=i}^{W} \frac{C_t}{t} \right] = \lim_{W \to \infty} \bar{C} \sum_{t=i}^{W} \frac{1}{t} = +\infty. $$

Thus, for a fixed $i$, letting $W$ go to infinity leads to an infinite average MI, which translates
into a higher $\bar{R}_{gTS}$. Note that this is valid only for relatively small $i$ and large $W$, i.e., only
messages transmitted earlier get advantage from increasing $W$, while the rest of the messages are
penalized. For instance, if $M > W$, while message $W_1$ is allocated a total of $n \sum_{t=1}^{W} \frac{1}{t}$ channel
uses over $W$ channel blocks, message $W_M$ only receives a fraction $\frac{1}{W}$ of a channel block. If $W$
is small compared to $M$, as in the plot of Fig. 4 for $P = 0$ dB, the fraction of messages which
get advantage from the increasing $W$ remains small compared to $M$; and hence, $\bar{R}_{gTS}$ does not
increase with $W$ for the considered range.

Fig. 5. Average decoded rate for the gTS scheme plotted against the window size $W$ for different values of $M$, $P = 5$ dB.
(a) $R = 1$ bpcu ($R < \bar{C}$). (b) $R = 1$ bpcu ($R > \bar{C}$).

Note that the TS scheme in Section III-B is a special case of the gTS scheme obtained by
letting $W = M$. On the other extreme, by letting $W = 1$, we obtain the MT scheme.

Although the idea of encoding a message over a fraction of the available consecutive slots
(e.g., $W < M$ for message $W_1$ in gTS) can be applied to all the schemes considered in this
paper, the analysis becomes quite cumbersome. Hence, we restrict our analysis to the TS scheme
as explained above.

D. Superposition Transmission (ST)

Next we consider superposition transmission (ST), in which the transmitter transmits in channel
block $t$, $t \in \{1, \ldots, M\}$, the superposition of $t$ codewords, chosen from $t$ independent Gaussian
codebooks of size $2^{nR}$, corresponding to the available messages $\{W_1, \ldots, W_t\}$. The codewords
are scaled such that the average total transmit power in each block is $P$. In the first block, only
information about message $W_1$ is transmitted with average power $P_{11} = P$; in the second block
we divide the total power $P$ among the two messages, allocating $P_{12}$ and $P_{22}$ for $W_1$ and $W_2$,
respectively. In general, over channel block $t$ we allocate an average power $P_{it}$ for $W_i$, while
$\sum_{i=1}^{t} P_{it} = P$.

Let $S$ be any subset of the set of messages $\mathcal{M} = \{1, \ldots, M\}$. We define $C(S)$ as follows:
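The defining expression for $C(S)$ is missing from this copy of the text. A form consistent with the description around it (joint decoding of the messages in $S$, with the codewords of the remaining messages treated as noise) would be the expression below. It is a reconstruction under that reading, not necessarily the authors' verbatim equation, and it takes $P_{st} = 0$ for $t < s$ because message $W_s$ is not yet available in earlier blocks.

```latex
C(S) \triangleq \sum_{t=1}^{M} \log_2\!\left( 1 + \frac{\phi[t] \sum_{s \in S} P_{st}}
       {1 + \phi[t] \sum_{s \in \mathcal{M} \setminus S} P_{st}} \right)
```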

This provides an upper bound on the total rate of messages in set S that can be decoded jointly 
at the user considering the codewords corresponding to the remaining messages as noise. The 
receiver first checks if any of the messages can be decoded alone by considering the other 
transmissions as noise. If a message can be decoded, the corresponding signal is subtracted and 
the process is repeated over the remaining signal. If no message can be decoded alone, then the 
receiver considers joint decoding of message pairs, followed by triplets, and so on so forth. This 
optimal decoding algorithm for superposition transmission is outlined in Algorithm [TJ below. The 
user calls the algorithm with Rate = and M. — {1, . . . , M} initially. 

Algorithm 1 Total_Decoded_Rate (Rate, $\mathcal{M}$, P)

    boolean Decoded = 0
    for i = 1 to $|\mathcal{M}|$ do
        if $iR \le \max_{\mathcal{S}: \mathcal{S} \subseteq \mathcal{M}, |\mathcal{S}| = i} C(\mathcal{S})$ then
            Decoded = 1
            Rate = Rate + $iR$
            $\mathcal{M} = \mathcal{M} \setminus \mathcal{S}$
            quit for
        end if
    end for
    if ($\mathcal{M} \ne \emptyset$) AND (Decoded) then
        return Total_Decoded_Rate (Rate, $\mathcal{M}$, P)
    else
        return Rate
    end if

While Algorithm 1 gives us the maximum total rate, it is challenging in general to find a
closed-form expression for the average total rate and to optimize the power allocation. Hence, we
focus here on the special case of equal power allocation, where we divide the total average power
$P$ among all the available messages at each channel block. The performance of the ST scheme
will be studied numerically in Section V and compared with the other transmission schemes and
with an upper bound, which is introduced next.
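The decoding procedure of Algorithm 1 under equal power allocation can be sketched in Python as follows. This is only an illustrative sketch: the expression used for $C(\mathcal{S})$ is an assumption (a sum over blocks of $\log_2(1 + \text{SINR})$, with the not-yet-decoded messages outside $\mathcal{S}$ treated as noise), the function names `c_of_s` and `total_decoded_rate` are hypothetical, and the recursion is written as a loop. The exhaustive subset search is exponential in $M$, so it is only practical for small examples.

```python
from itertools import combinations
from math import log2

def c_of_s(subset, remaining, phi, P):
    """ASSUMED form of C(S): sum over blocks of log2(1 + SINR), where the
    undecoded messages in `remaining` that are not in `subset` act as noise.
    Equal power allocation: each of the t messages in block t gets P / t."""
    total = 0.0
    for t, gain in enumerate(phi, start=1):
        share = P / t
        signal = share * sum(1 for i in subset if i <= t)
        interference = share * sum(
            1 for i in remaining if i <= t and i not in subset)
        total += log2(1.0 + gain * signal / (1.0 + gain * interference))
    return total

def total_decoded_rate(phi, P, R):
    """Loop version of Algorithm 1. Returns (total decoded rate, decoded messages)."""
    remaining = set(range(1, len(phi) + 1))
    rate, decoded = 0.0, []
    progress = True
    while remaining and progress:
        progress = False
        # Try single messages first, then pairs, triplets, and so on.
        for size in range(1, len(remaining) + 1):
            best = max(combinations(sorted(remaining), size),
                       key=lambda s: c_of_s(s, remaining, phi, P))
            if size * R <= c_of_s(best, remaining, phi, P):
                rate += size * R
                decoded.extend(best)
                remaining -= set(best)   # subtract the decoded codewords
                progress = True
                break
    return rate, sorted(decoded)

# Two blocks with channel power gains 10 and 10, P = 10, R = 1 bpcu.
print(total_decoded_rate([10.0, 10.0], 10.0, 1.0))
```

In this example message 1 is decoded first with message 2 treated as noise; once its codeword is subtracted, message 2 is decoded interference-free.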

IV. Upper Bound 

We provide an upper bound on the performance by assuming that the transmitter is informed 
about the exact channel realizations at the beginning of the transmission. This allows the 
transmitter to optimally allocate the resources among messages to maximize $\bar{R}$. Assume that
$C_1, \ldots, C_M$ are known by the transmitter and that the maximum number of messages that can be
decoded is $m \le M$. By reordering, we can always take the first $m$ messages to be the successfully
decoded ones. When the channel state is known at the transmitter, the first $m$ messages
can be decoded successfully if and only if [22]

$$iR \le C_{m-i+1} + C_{m-i+2} + \cdots + C_M, \quad \text{for } i = 1, \ldots, m.$$

We can equivalently write these conditions as

$$R \le \min_{i \in \{1, \ldots, m\}} \frac{1}{m-i+1} \sum_{j=i}^{M} C_j. \tag{46}$$



Then, for each channel realization $\{h[1], \ldots, h[M]\}$, the upper bound on the average decoded
rate is given by $\frac{m^*}{M} R$, where $m^*$ is the greatest value of $m$ that satisfies (46). This is an upper bound
for each specific channel realization, obtained by optimally allocating the resources. An upper
bound on $\bar{R}$ can be obtained by averaging this over the distribution of the channel realizations.
Another upper bound on $\bar{R}$ can be found from the ergodic capacity, assuming that all messages are
available at the encoder at the beginning and letting $M$ go to infinity. Thus, $\bar{R}$ can be bounded as:

$$\bar{R} \le \min\{R, \bar{C}\}. \tag{47}$$



The bound $\bar{R} \le R$ follows naturally from the data arrival rate. Comparing (47) with the asymptotic
average decoded rate of the aJE scheme, we see that the aJE scheme achieves the optimal average decoded
rate in the limit of infinite $M$.
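For a single channel realization, the informed transmitter bound can be evaluated by searching for the largest $m$ that satisfies condition (46). The following Python sketch illustrates this; the function name `informed_upper_bound` and the example capacity values are hypothetical, and averaging over channel realizations is left out.

```python
def informed_upper_bound(capacities, R):
    """Return (m_star, m_star / M * R) for one realization C_1, ..., C_M,
    where m_star is the largest m satisfying condition (46)."""
    M = len(capacities)
    # suffix[i] holds C_{i+1} + ... + C_M (0-based list index), suffix[M] = 0.
    suffix = [0.0] * (M + 1)
    for i in range(M - 1, -1, -1):
        suffix[i] = suffix[i + 1] + capacities[i]
    for m in range(M, 0, -1):          # try the largest m first
        # (46): (m - i + 1) * R <= C_i + ... + C_M for every i = 1, ..., m
        if all((m - i + 1) * R <= suffix[i - 1] for i in range(1, m + 1)):
            return m, m / M * R
    return 0, 0.0

# A good channel only in the first block helps only the first message,
# while a good channel in the last block can serve all messages.
print(informed_upper_bound([3.0, 0.0, 0.0], 1.0))
print(informed_upper_bound([0.0, 0.0, 3.5], 1.0))
```

The two examples have a similar total capacity but different bounds, which reflects the causality of the message arrivals: capacity in early blocks cannot be used for messages that have not yet arrived.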

V. Numerical Results 

In this section we provide numerical results comparing the proposed transmission schemes.
For the simulations we assume that the channel is Rayleigh fading, i.e., the channel state $\phi[t]$ is
exponentially distributed with parameter 1, i.e., $f_\Phi(\phi) = e^{-\phi}$ for $\phi > 0$, and zero otherwise.

Fig. 6. The cumulative mass function (cmf) of the number of decoded messages for $R = 1$ bpcu and $M = 50$. Panel (a): $P = 1.44$ dB ($\bar{C} > R$).

Fig. 7. Average number of decoded messages vs. the total number of messages $M$ for $R = 1$ bpcu. Panel (b): $P = 2$ dB ($R < \bar{C}$).

In Fig. 6(a) the cumulative mass function (cmf) of the number of decoded messages is shown for the
different transmission techniques for $R = 1$, $M = 50$ and $P = 1.44$ dB, which corresponds to an
outage probability of $p = 0.5$ for the MT scheme and an average channel capacity $\bar{C} \simeq 1.07 > R$.
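The capacity and outage values quoted in this section can be reproduced numerically for the unit-mean exponential channel state. The sketch below is illustrative only: it assumes that the average capacity is $E[\log_2(1 + P\phi)]$ and that the MT outage event is $\log_2(1 + P\phi) < R$ in a single block; the function names and the Simpson integration parameters are choices made here.

```python
from math import exp, log2

def ergodic_capacity(snr_db, upper=60.0, steps=20_000):
    """E[log2(1 + P*phi)] for phi ~ Exp(1), using composite Simpson
    integration on [0, upper] (the tail beyond `upper` is negligible)."""
    P = 10.0 ** (snr_db / 10.0)
    f = lambda x: log2(1.0 + P * x) * exp(-x)
    h = upper / steps                      # steps must be even
    total = f(0.0) + f(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(k * h)
    return total * h / 3.0

def mt_outage(snr_db, R):
    """Pr{log2(1 + P*phi) < R} = 1 - exp(-(2^R - 1) / P) for phi ~ Exp(1)."""
    P = 10.0 ** (snr_db / 10.0)
    return 1.0 - exp(-(2.0 ** R - 1.0) / P)

for snr_db in (1.44, 0.0, -3.0, 2.0):
    print(snr_db, round(ergodic_capacity(snr_db), 3))
print(round(mt_outage(1.44, 1.0), 3))
```

Under these assumptions the computed capacities are consistent with the values 1.07, 0.86, 0.522 and 1.158 used in this section, and the MT outage probability at 1.44 dB is close to 0.5.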
We see that MT outperforms the ST and TS schemes, as its cmf lies below the other two. On the
other hand, the comparison with the JE scheme depends on the performance metric we choose.
For instance, JE has the lowest probability of decoding more than $m$ messages for $m < 15$, while
it has the highest probability for $m > 22$. In Fig. 6(b) the cmfs for the case of $P = 0$ dB are
shown. In this case the average capacity is $\bar{C} \simeq 0.86$. Comparing Fig. 6(b) and Fig. 6(a), we
see how the cmf of the JE scheme behaves differently depending on whether $\bar{C}$ is above or
below $R$. We see from Fig. 6(b) that for the JE scheme there is a probability of about 0.3 of not
decoding any message, while in all the other schemes this probability is zero. However, the
JE scheme also has the highest probability of decoding more than 30 messages. Furthermore, we
note that the cmf of the gTS scheme converges to the cmf of the TS scheme at low SNR. This is
because, as shown in Section III-C, when $\bar{C} < R$ the optimal window size $W$ is equal to $M$,
which is nothing but the TS scheme.

In the following, we focus on the average decoded rate
as our performance metric. In Fig. 7(a) and Fig. 7(b) the average number of decoded messages
is plotted against $M$ for SNR values of $-3$ dB and 2 dB, respectively, and a message rate
of $R = 1$ bpcu. While JE outperforms the other schemes at SNR $= 2$ dB, it has the poorest
performance at SNR $= -3$ dB. This behavior is expected based on the threshold behavior of the
JE scheme that we have outlined in Section III-A. Note that the average capacities corresponding
to SNR $= -3$ dB and 2 dB are $\bar{C} = 0.522$ and $\bar{C} = 1.158$, respectively. The former is below
the target rate $R = 1$ and the receiver can decode almost no message, whereas the latter is
above $R = 1$, leading to an average decoded rate close to the optimal value. Note from the two
figures that none of the schemes dominates the others at all SNR values.

Fig. 8. Average decoded rate vs. $R$ for $P = 20$ dB and $M = 100$ messages. The upper bound $\min(R, \bar{C})$ is also shown. Curves: joint encoding (JE), adaptive JE (aJE), time sharing (TS), memoryless (MT), superposition (ST), upper bound, $\min(R, \bar{C})$. Horizontal axis: $R$ (bpcu).

In Fig. 8, $\bar{R}$ is plotted
against the transmission rate $R$ for the case of $M = 100$ and $P = 20$ dB. The aJE scheme
outperforms all the other schemes, performing very close to the upper bound. The number $M'$
of messages transmitted in the aJE scheme is chosen so that $\frac{M'}{M} = 0.95 \frac{\bar{C}}{R}$. In the figure we also
show the upper bound obtained from the ergodic capacity, $\min(R, \bar{C})$. It can be seen that it
closely approximates the informed transmitter upper bound for $R < 6$. The JE scheme performs
better than the others up to a certain transmission rate, beyond which it rapidly becomes the worst
one. This is due to the phase transition behavior, observed here even for a relatively small $M$.
Among the other schemes, MT achieves the highest average decoded rate in the region $R < 6.8$,
while TS has the worst performance. The opposite is true in the region $R > 6.8$, where the curve
of the ST scheme lies between the curves of the MT and TS schemes. We have
repeated the simulations with different parameters (i.e., changing $P$ and $M$) with similar results;
that is, the MT, TS, and ST schemes meet at approximately the same point, below which MT has the
best performance of the three, while above the intersection TS has the best performance. At the
moment we have no analytical explanation for this observation, which would mean that there is
always a scheme outperforming ST.

Fig. 9. Average decoded rate $\bar{R}$ vs. distance from the transmitter for $R = 1$ bpcu, $M = 100$, $P = 20$ dB and $\alpha = 3$.

We next study the performance of the considered schemes
as a function of the distance from the transmitter. We scale the average received power at the
receiver with $d^{-\alpha}$, where $d$ is the distance from the transmitter to the receiver and $\alpha$ is the path
loss exponent. The results are shown in Fig. 9 for $P = 20$ dB, $M = 100$, $R = 1$ bpcu and a
path loss exponent $\alpha = 3$. The dependence of $\bar{R}$ on the distance is important, for instance, in the
context of broadcast transmission in cellular networks, in which case the receiving terminals may
have different distances from the transmitter. In such a scenario the range of the average channel
SNR values at the receivers becomes important, and the transmitter should use a transmission
scheme that performs well over this range. For instance, in a system in which all users have the
same average SNR, which is the case for a narrow-beam satellite system where the SNR within
the beam footprint has variations of at most a few dB on average [21], the transmission scheme
should perform well around the average SNR of the beam. A similar situation may occur in a
microcell, where the relatively small radius of the cell implies a limited variation in the average
SNR range experienced by the users at different distances from the transmitter. Instead, in the
case of a macrocell, in which the received SNR may vary significantly from the proximity of the
transmitter to the edge of the cell, the transmitter should adopt a scheme which performs well
over a larger range of SNR values.

In the range up to $d = 4$ the JE scheme achieves the highest
average decoded rate, while for $d > 6$ the TS scheme outperforms the others. The drop in the
decoded rate of the JE scheme when passing from $d = 4$ to $d = 5$ is similar to what we observe
in Fig. 8 when the rate increases beyond $R = 6$ bpcu. In both cases the transition takes place as
the transmission rate surpasses the average channel capacity. The aJE scheme, which selects the
fraction of messages to transmit based on $\bar{C}$, outperforms all other schemes and gets relatively
close to the informed transmitter upper bound and the ergodic capacity. The aJE scheme adapts
the average transmission rate at the message level to the average channel capacity. We recall that,
in the aJE scheme, the transmitter only has statistical knowledge of the channel, and yet gets
quite close to the performance of a genie-aided transmitter even for a reasonably low number
of channel blocks. We further notice how the adaptive JE scheme closely approaches the ergodic
capacity, even though data arrives gradually at the transmitter during the transmission, instead
of being available at the beginning, which is generally assumed for the achievability of the
ergodic capacity [6]. We should note that in Fig. 9 the average transmission rate is optimized
for each given distance for the aJE scheme, while such optimization is not done for the other
schemes. Thus, in case two (or more) terminals have different distances from the transmitter, the
optimization can no longer be performed and a tradeoff between the average decoded rates of the
two nodes would be needed. The performance can be improved by considering a combination
of the aJE scheme with the TS or ST schemes. The plots in Fig. 9 show how the TS, MT and ST
schemes are more robust compared to the JE scheme, as their average decoded rate decreases
smoothly with the distance, unlike the JE scheme, which has a sudden drop.
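The location of the JE threshold in terms of distance can be checked with a short computation. The following Python sketch assumes that the average received SNR is $P d^{-\alpha}$ and that the average capacity is $E[\log_2(1 + \text{SNR}\cdot\phi)]$ with $\phi$ unit-mean exponential; the function names, the bisection interval and the integration parameters are choices made here for illustration.

```python
from math import exp, log2

def ergodic_capacity(snr, upper=60.0, steps=20_000):
    """E[log2(1 + snr*phi)], phi ~ Exp(1), by composite Simpson integration."""
    f = lambda x: log2(1.0 + snr * x) * exp(-x)
    h = upper / steps                      # steps must be even
    total = f(0.0) + f(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(k * h)
    return total * h / 3.0

def capacity_at_distance(d, p_db=20.0, alpha=3.0):
    """Average capacity when the average received SNR is P * d^(-alpha)."""
    return ergodic_capacity(10.0 ** (p_db / 10.0) * d ** (-alpha))

def threshold_distance(R, lo=1.0, hi=10.0, iters=40):
    """Bisection for the distance at which the average capacity equals R
    (the capacity decreases monotonically with the distance)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if capacity_at_distance(mid) > R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(capacity_at_distance(4.0), 3), round(capacity_at_distance(5.0), 3))
print(round(threshold_distance(1.0), 2))
```

With $P = 20$ dB, $\alpha = 3$ and $R = 1$ bpcu, the average capacity is above $R$ at $d = 4$ and below $R$ at $d = 5$, which is consistent with the drop of the JE curve between these two distances.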

VI. Conclusions 

We have considered a transmitter streaming data to a receiver over a block fading channel, 
such that the transmitter is provided with an independent message at a fixed rate at the beginning 
of each channel block. We have used the average decoded rate as our performance metric. We 
have proposed several new transmission schemes based on joint encoding, time-division and 
superposition encoding. A general upper bound on the average decoded rate has also been
introduced assuming the availability of CSI at the transmitter. 

We have shown analytically that the joint encoding (JE) scheme has a threshold behavior and 
performs well when the target rate is below the average channel capacity C, while its performance 
drops sharply when the target rate surpasses C. To adapt to an average channel capacity that is 
below the fixed message rate R, the adaptive joint encoding (aJE) scheme transmits only some 
of the messages. We have proved analytically that the aJE scheme is asymptotically optimal as 
the number of channel blocks goes to infinity, even though data arrives gradually over time at 
a fixed rate, rather than being available initially. We have also shown numerically that, even for 
a finite number of messages, the aJE scheme outperforms other schemes in all the considered 
settings and performs close to the upper bound. 

We have also proposed the time-sharing (TS) and superposition transmission (ST) schemes, 
as well as a generalized TS scheme which transmits each message over a certain number of 
channel blocks. While none of these schemes outperforms the others in all settings, their performance
degrades gradually with decreasing average SNR, as opposed to the threshold behavior of the
JE scheme. This provides robustness in the case of multiple receivers with different average 
SNRs or when the channel statistics information at the transmitter is not accurate. 

Appendix 
A. Proof of Theorem 1 

Let $B_k$ denote the event "the first $k$ messages can be decoded at the end of channel block $k$",
while $\bar{B}_k$ denotes the complementary event. The event $B_k$ holds if and only if

$$C_{k-i+1} + C_{k-i+2} + \cdots + C_k \ge iR \tag{48}$$

is satisfied for all $i = 1, \ldots, k$. Let $E_{k,j}$ denote the event "the $j$-th inequality needed to decode
the first $k$ messages in $k$ channel blocks is satisfied", that is:

$$E_{k,j} \triangleq \{C_{k-j+1} + \cdots + C_k \ge jR\}, \quad \text{for } j = 1, \ldots, k, \tag{49}$$

while $\bar{E}_{k,j}$ denotes the complementary event.

Note that in the JE scheme, if $m$ messages are decoded, these are the first $m$ messages. Let
$n_d$ denote the number of decoded messages at the end of channel block $M$. Then the average
decoded rate is

$$\bar{R}_{JE} = \frac{R}{M} \left[ \Pr\{n_d \ge 1\} + \Pr\{n_d \ge 2\} + \cdots + \Pr\{n_d \ge M-1\} + \Pr\{n_d \ge M\} \right]. \tag{50}$$

The $k$-th term in the sum of Eqn. (50) is the probability of decoding at least $k$ (i.e., $k$ or more)
messages. Each term in (50) can be expressed as the sum of two terms as:

$$\Pr\{n_d \ge k\} = \Pr\{B_k, n_d \ge k\} + \Pr\{\bar{B}_k, n_d \ge k\}. \tag{51}$$



The first term of the sum in (51) is the probability of "decoding $k$ messages at the end of
channel block $k$ and decoding at least $k$ messages at the end of $M$ channel blocks". Note that
this corresponds to the event $B_k$, since if $B_k$ holds, the event "decode at least $k$ messages at the
end of channel block $M$" is satisfied. We have:

$$\Pr\{B_k, n_d \ge k\} = \Pr\{B_k\} = \Pr\{E_{k,1}, \ldots, E_{k,k}\}. \tag{52}$$

As for the second term of the sum in (51), it is the probability of decoding at least $k$ messages
but not $k$ at the end of channel block $k$. It can be further decomposed into the sum of two
terms, one corresponding to the probability of decoding and the other to the probability of not
decoding $k+1$ messages at the end of block $k+1$ while decoding at least $k$ messages in $M$
blocks, i.e.:

$$\Pr\{\bar{B}_k, n_d \ge k\} = \Pr\{\bar{B}_k, B_{k+1}, n_d \ge k\} + \Pr\{\bar{B}_k, \bar{B}_{k+1}, n_d \ge k\}. \tag{53}$$

Looking at the first term, as seen before, the event $n_d \ge k$ is true if the condition $B_{k+1}$
is satisfied (i.e., if $k+1$ messages are decoded at the end of block $k+1$, then more than $k$
messages are decoded at the end of channel block $M$), that is:

$$\Pr\{\bar{B}_k, B_{k+1}, n_d \ge k\} = \Pr\{\bar{B}_k, B_{k+1}\}.$$

Plugging these into (51), we obtain

$$\Pr\{n_d \ge k\} = \Pr\{B_k\} + \Pr\{\bar{B}_k, B_{k+1}\} + \Pr\{\bar{B}_k, \bar{B}_{k+1}, n_d \ge k\}. \tag{54}$$

We can continue in a similar fashion, so that, in general, the event "at least $k$ messages are
decoded" can be written as the union of the disjoint events ("$k$ messages are decoded in $k$
slots") $\cup$ ("$k$ messages are not decoded in $k$ slots but $k+1$ messages are decoded in $k+1$
slots") $\cup \cdots \cup$ ("no message can be decoded before slot $M$ but $M$ messages are decoded
in slot $M$"). Hence, by the law of total probability, the probability of decoding at least $k$
messages can be written as:

$$\Pr\{n_d \ge k\} = \sum_{j=k}^{M} \Pr\{\bar{B}_k, \bar{B}_{k+1}, \ldots, \bar{B}_{j-1}, B_j\}. \tag{55}$$

Note that each term of the sum in (55) says nothing about what happens to messages beyond
the $j$-th, which can either be decoded or not. Plugging (55) in (50) we find:

$$E[m] = \sum_{k=1}^{M} \Pr\{n_d \ge k\} = \sum_{k=1}^{M} \sum_{j=k}^{M} \Pr\{\bar{B}_k, \bar{B}_{k+1}, \ldots, \bar{B}_{j-1}, B_j\} = \sum_{j=1}^{M} \sum_{k=1}^{j} \Pr\{\bar{B}_k, \bar{B}_{k+1}, \ldots, \bar{B}_{j-1}, B_j\}. \tag{56}$$

We can rewrite each of these events as the intersection of events of the kind $E_{k,i}$ and $\bar{E}_{k,i}$.
Each term of the sum in (56) can be split into the sum of the probabilities of two disjoint events:

$$\Pr\{\bar{B}_k, \bar{B}_{k+1}, \ldots, \bar{B}_{j-1}, B_j\} = \Pr\{E_{k,1}, \bar{B}_k, \bar{B}_{k+1}, \ldots, \bar{B}_{j-1}, B_j\} + \Pr\{\bar{E}_{k,1}, \bar{B}_k, \bar{B}_{k+1}, \ldots, \bar{B}_{j-1}, B_j\}. \tag{57}$$

As the event $\bar{E}_{k,1}$ implies the event $\bar{B}_k$, this can be removed from the second term on the
right-hand side of (57). Note that, in general, the event $\bar{E}_{k,i}$, $i \in \{1, \ldots, k\}$, implies the event $\bar{B}_k$. In
order to remove the event $\bar{B}_k$ from the first term as well, we write it as the sum of probabilities
of two disjoint events: one intersecting with $E_{k,2}$ and the other with $\bar{E}_{k,2}$. Then we get:

$$\Pr\{\bar{B}_k, \bar{B}_{k+1}, \ldots, \bar{B}_{j-1}, B_j\} = \Pr\{E_{k,1}, E_{k,2}, \bar{B}_k, \ldots, \bar{B}_{j-1}, B_j\} + \Pr\{E_{k,1}, \bar{E}_{k,2}, \bar{B}_k, \ldots, \bar{B}_{j-1}, B_j\} + \Pr\{\bar{E}_{k,1}, \bar{B}_{k+1}, \ldots, \bar{B}_{j-1}, B_j\}. \tag{58}$$

Now $\bar{B}_k$ can be removed from the second term of the sum thanks to the presence of $\bar{E}_{k,2}$. Each
of the terms on the right-hand side of (58) can be further written as the sum of the probabilities
of two disjoint events, and so on. The process is iterated until all the $\bar{B}_d$, $d < j$, events
are eliminated and we are left with events that are intersections of only events of the type $E_{p,q}$
and $\bar{E}_{p,q}$, for some $p, q \in \{k, k+1, \ldots, M\}$, and $B_j$. The iteration is done as follows:

For each term of the summation, we take the $\bar{B}_l$ event with the lowest index. If any $\bar{E}_{l,j}$
event is present, then $\bar{B}_l$ can be eliminated. If not, we write the term as the sum of the two
probabilities corresponding to the events which are the intersections of the $\bar{B}_l$ event with $E_{l,d+1}$
and $\bar{E}_{l,d+1}$, respectively, where $d$ is the highest index $j$ among the events in which $E_{l,j}$ is already
present. The iterative process stops when $l = j$.

At the end of the process all the probabilities involving events $\bar{B}_k, \ldots, \bar{B}_{j-1}$ will be removed
and replaced by sequences of the kind:

$$\{E_{k,1}, E_{k,2}, \ldots, \bar{E}_{k,i_k}, E_{k+1,i_k+1}, \ldots, \bar{E}_{k+1,i_{k+1}}, \ldots, E_{j-1,i_{j-2}+1}, \bar{E}_{j-1,i_{j-1}}, B_j\},$$

where $i_{j-1} \in \{j-1-k, \ldots, j-1\}$ is the index corresponding to the last inequality needed to
decode $j-1$ messages which is not satisfied. Note that exactly one $\bar{E}_{l,r}$ event for each $\bar{B}_l$ is
present after the iteration.

In order to guarantee that $B_j$ holds, all the events $E_{j,1}, \ldots, E_{j,j}$ must be verified. It is easy
to show that, after the iterative process used to remove the $\bar{B}_l$'s, the event $E_{j,i_{j-1}+1}$ ensures that
all the events needed for $B_j$ with indices lower than or equal to $i_{j-1}$ are automatically verified.
Thus, we can add the events $\{E_{j,i_{j-1}+1}, \ldots, E_{j,j}\}$ to guarantee that $B_j$ holds, and remove it
from the list. It is important to notice that the term $E_{j,j}$ is always present. At this point we
are left with the sum of probabilities of events, which we call E-events, each of which is the
intersection of events of the form $E_{i,j}$ and $\bar{E}_{i,j}$. Thus, an E-event $S_k^j$ has the following form:

$$S_k^j = \{E_{k,1}, E_{k,2}, \ldots, \bar{E}_{k,i_k}, E_{k+1,i_k+1}, \ldots, \bar{E}_{k+1,i_{k+1}}, \ldots, E_{j-1,i_{j-2}+1}, \bar{E}_{j-1,i_{j-1}}, E_{j,i_{j-1}+1}, \ldots, E_{j,j}\}. \tag{59}$$



By construction, the number of E-events for the generic term $j$ of the sum in (56) is equal to
the number of possible dispositions of $j-k$ $\bar{E}$'s over $j-1$ positions. As the number of events
of type $\bar{E}$ is different for the E-events of different terms in (56), the E-events relative to two
different terms of (56) are different. We define $\mathcal{S}_j$ as the set of all E-events which contain the
event $E_{j,j}$. The elements of $\mathcal{S}_j$ correspond to all the possible ways in which $j$ messages can be
decoded at the end of block number $j$. The cardinality of $\mathcal{S}_j$ is equal to:

$$|\mathcal{S}_j| = \sum_{k=1}^{j} \binom{j-1}{j-k} = 2^{j-1}, \tag{60}$$

which is the number of all possible combinations of $j-1$ elements each of which can take value
$E$ or $\bar{E}$. Now we want to prove that

$$\sum_{S_l^j \in \mathcal{S}_j} \Pr\{S_l^j\} = \Pr\{E_{j,j}\}. \tag{61}$$



Note that the $E_{k,l}$'s correspond to different events if the index $k$ is different, even for the same
index $l$; thus, the law of total probability cannot be directly applied to prove (61). However,
the following can be easily verified: $\Pr\{E_{k_1,l}\} = \Pr\{E_{k_2,l}\}$, $\forall k_1, k_2$. This implies that the
probabilities of two E-events which differ in some or all of the $k$ indices (but not in the $l$
indices) of their constituent events are the same. A proof is given in the following.

Proposition 1: Let us consider a set of random variables $C_1, \ldots, C_j$ that are conditionally
i.i.d. given $U$. Given any two ordering vectors $\mathbf{i} = i_1, i_2, \ldots, i_j$ and $\mathbf{l} = l_1, l_2, \ldots, l_j$, we have

$$\Pr\{C_{i_1} \gtrless R, \ldots, C_{i_1} + \cdots + C_{i_j} \gtrless jR\} = \Pr\{C_{l_1} \gtrless R, \ldots, C_{l_1} + \cdots + C_{l_j} \gtrless jR\}. \tag{62}$$

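Proof: The left-hand side of Eqn. (62) can be rewritten as:

$$\Pr\{C_{i_1} \gtrless R, \ldots, C_{i_1} + \cdots + C_{i_j} \gtrless jR\} = \int_{-\infty}^{+\infty} du \int_{\theta_1^{low}}^{\theta_1^{up}} dc_{i_1} \cdots \int_{\theta_j^{low}}^{\theta_j^{up}} dc_{i_j}\, f_{\mathbf{C}_{\mathbf{i}}|U}(\mathbf{c}_{\mathbf{i}}|u) f_U(u), \tag{63}$$

where $\mathbf{C}_{\mathbf{i}} = C_{i_1}, \ldots, C_{i_j}$ and $\mathbf{c}_{\mathbf{i}} = c_{i_1}, \ldots, c_{i_j}$, while $\theta_h^{low}$ and $\theta_h^{up}$ are the lower and upper
extremes of the integration interval. $\theta_h^{low}$ is either equal to $-\infty$ or to $hR - c_{i_1} - \cdots - c_{i_{h-1}}$,
$\forall h \in \{1, \ldots, j\}$, depending on whether there is a $<$ or a $\ge$ in the $h$-th inequality within brackets
in Eqn. (63), respectively, while $\theta_h^{up}$ is either equal to $hR - c_{i_1} - \cdots - c_{i_{h-1}}$ or to $+\infty$, depending
on whether there is a $<$ or a $\ge$ in the $h$-th inequality of Eqn. (63), respectively. By using the
conditional independence of the $C_i$'s given $U$ in Eqn. (63), we can write:

$$\begin{aligned}
\Pr\{C_{i_1} \gtrless R, \ldots, C_{i_1} + \cdots + C_{i_j} \gtrless jR\}
&= \int_{-\infty}^{+\infty} du\, f_U(u) \int_{\theta_1^{low}}^{\theta_1^{up}} dc_{i_1} \cdots \int_{\theta_j^{low}}^{\theta_j^{up}} dc_{i_j}\, f_{\mathbf{C}_{\mathbf{i}}|U}(\mathbf{c}_{\mathbf{i}}|u) \\
&= \int_{-\infty}^{+\infty} du\, f_U(u) \int_{\theta_1^{low}}^{\theta_1^{up}} dc_{i_1} \cdots \int_{\theta_j^{low}}^{\theta_j^{up}} dc_{i_j}\, f_{C_{i_1}|U}(c_{i_1}|u) \cdots f_{C_{i_j}|U}(c_{i_j}|u).
\end{aligned} \tag{64}$$

Finally, since the $C_i$'s are identically distributed given $U$, from Eqn. (64) we find:

$$\begin{aligned}
\Pr\{C_{i_1} \gtrless R, \ldots, C_{i_1} + \cdots + C_{i_j} \gtrless jR\}
&= \int_{-\infty}^{+\infty} du\, f_U(u) \int_{\theta_1^{low}}^{\theta_1^{up}} dc_{i_1} \cdots \int_{\theta_j^{low}}^{\theta_j^{up}} dc_{i_j}\, f_{\mathbf{C}_{\mathbf{i}}|U}(\mathbf{c}_{\mathbf{i}}|u) \\
&= \int_{-\infty}^{+\infty} du\, f_U(u) \int_{\theta_1^{low}}^{\theta_1^{up}} dc_{i_1} \cdots \int_{\theta_j^{low}}^{\theta_j^{up}} dc_{i_j}\, f_{C_{i_1}|U}(c_{i_1}|u) \times \cdots \times f_{C_{i_j}|U}(c_{i_j}|u) \\
&= \int_{-\infty}^{+\infty} du\, f_U(u) \int_{\theta_1^{low}}^{\theta_1^{up}} dc_{l_1} \cdots \int_{\theta_j^{low}}^{\theta_j^{up}} dc_{l_j}\, f_{C_{l_1}|U}(c_{l_1}|u) \times \cdots \times f_{C_{l_j}|U}(c_{l_j}|u) \\
&= \Pr\{C_{l_1} \gtrless R, \ldots, C_{l_1} + \cdots + C_{l_j} \gtrless jR\}.
\end{aligned} \tag{65}$$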

The proposition above guarantees that, although these events do not partition the whole probability
space of $E_{j,j}$, their probabilities add up to that of $E_{j,j}$, i.e.:

$$\sum_{k=1}^{2^{j-1}} \Pr\{S_k^j\} = \Pr\{E_{j,j}\} = \Pr\{C_1 + \cdots + C_j \ge jR\}. \tag{66}$$

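Finally, plugging Eqn. (66) into Eqn. (56) we can write:

$$\begin{aligned}
E[m] = \sum_{k=1}^{M} \Pr\{n_d \ge k\} &= \sum_{j=1}^{M} \sum_{k=1}^{j} \Pr\{\bar{B}_k, \bar{B}_{k+1}, \ldots, \bar{B}_{j-1}, B_j\} \\
&= \sum_{j=1}^{M} \sum_{S_l^j \in \mathcal{S}_j} \Pr\{S_l^j\} = \sum_{j=1}^{M} \Pr\{C_1 + \cdots + C_j \ge jR\}.
\end{aligned} \tag{67}$$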
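The identity in (67) can be checked numerically with a small Monte Carlo experiment. The sketch below is only an illustration under stated assumptions: the block capacities are taken as $C_t = \log_2(1 + \text{SNR}\cdot\phi[t])$ with i.i.d. unit-mean exponential $\phi[t]$, the number of decoded messages $n_d$ is taken as the largest $j$ for which $B_j$ holds (as implied by (55)), and the function name `simulate` and all parameter values are hypothetical.

```python
import random
from math import log2

def simulate(M, R, snr, trials, seed=0):
    """Return (estimate of E[n_d], estimate of sum_j Pr{C_1+...+C_j >= jR}),
    both computed from the same channel samples."""
    rng = random.Random(seed)
    sum_nd, sum_rhs = 0, 0
    for _ in range(trials):
        C = [log2(1.0 + snr * rng.expovariate(1.0)) for _ in range(M)]
        # n_d: largest j such that (48) holds for all i = 1, ..., j
        n_d = 0
        for j in range(1, M + 1):
            tail, ok = 0.0, True
            for i in range(1, j + 1):
                tail += C[j - i]            # adds C_{j-i+1} (1-based index)
                if tail < i * R:
                    ok = False
                    break
            if ok:
                n_d = j
        sum_nd += n_d
        # Right-hand side of (67): count the j with C_1 + ... + C_j >= jR
        prefix = 0.0
        for j in range(1, M + 1):
            prefix += C[j - 1]
            sum_rhs += prefix >= j * R
    return sum_nd / trials, sum_rhs / trials

lhs, rhs = simulate(M=6, R=1.0, snr=1.5, trials=20000, seed=1)
print(lhs, rhs)
```

The two estimates are not equal sample by sample, but under these assumptions they should agree up to the Monte Carlo error, which is the content of (67).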